Advances in Natural Language Generation: Generating Varied Outputs from Semantic Inputs

Author: Castro Ferreira Thiago
Publication venue: Ipskamp
Publication date: 01/01/2018

NeuralREG: An end-to-end approach to referring expression generation

Author: Ferreira Thiago Castro
Krahmer Emiel
Kádár Ákos
Moussallem Diego
Wubben Sander
Publication date: 01/01/2018

Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. Using a delexicalized version of the WebNLG corpus, we show that the neural model substantially improves over two strong baselines. Data and models are publicly available. Comment: Accepted for presentation at ACL 201

arXiv.org e-Print Archive

Crossref

Tilburg University Repository

Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model

Author: Emmery Chris
Ferreira Thiago Castro
Krahmer Emiel
van der Lee Chris
Wiltshire Travis
Publication date: 14/07/2022

This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also used. This study aims to answer this question by comparing a data-to-text system supplemented only with a language model to two data-to-text systems that are additionally enriched by either a data augmentation or a pseudo-labeling semi-supervised learning approach. Results show that semi-supervised learning results in higher scores on diversity metrics. In terms of output quality, extending the training set of a data-to-text system with a language model using the pseudo-labeling approach did increase text quality scores, but the data augmentation approach yielded similar scores to the system without training set extension. These results indicate that semi-supervised learning approaches can bolster output quality and diversity, even when a language model is also present. Comment: 22 pages (excluding bibliography and appendix)

arXiv.org e-Print Archive

Tilburg University Repository

The Third Multilingual Surface Realisation Shared Task (SR’20): Overview and Evaluation Results

Author: Anya Belz
Bernd Bohnet
Leo Wanner
Simon Mille
Thiago Castro Ferreira
Yvette Graham
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 10/08/2020

This paper presents results from the Third Shared Task on Multilingual Surface Realisation (SR’20), which was organised as part of the COLING’20 Workshop on Multilingual Surface Realisation. As in SR’18 and SR’19, the shared task comprised two tracks: (1) a Shallow Track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (2) a Deep Track where, additionally, functional words and morphological information were removed. Moreover, each track had two subtracks: (a) restricted-resource, where only the data provided or approved as part of a track could be used for training models, and (b) open-resource, where any data could be used. The Shallow Track was offered in 11 languages, whereas the Deep Track was offered in 3. Systems were evaluated using both automatic metrics and direct assessment by human evaluators in terms of Readability and Meaning Similarity to reference outputs. We present the evaluation results, along with descriptions of the SR’19 tracks, data and evaluation methods, as well as brief summaries of the participating systems. For full descriptions of the participating systems, please see the separate system reports elsewhere in this volume.

University of Brighton Research Portal

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

A Human-machine Cooperation Protocol for Machine Translation Output Edit Annotation

Author: Castro Ferreira Thiago
de Almeida Costa Felipe
Meira Jr., Wagner
S. Pagano Adriana
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2021

We report on a study exploring automatic edit annotation in a post-editing corpus with a new method for computing edit types. We examine edit type association with quality scores assigned to the machine translation output and the post-edited texts. Finally, we account for shortcomings in our method and point out edit types worth leveraging.

Diposit Digital de Documents de la UAB

Another PASS: a reproduction study of the human evaluation of a football report generation system

Author: Belz Anya
Castro Ferreira Thiago
Davis Brian
Mille Simon
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/05/2021

This paper reports results from a reproduction study in which we repeated the human evaluation of the PASS Dutch-language football report generation system (van der Lee et al., 2017). The work was carried out as part of the ReproGen Shared Task on Reproducibility of Human Evaluations in NLG, in Track A (Paper 1). We aimed to repeat the original study exactly, with the main difference that a different set of evaluators was used. We describe the study design, present the results from the original and the reproduction study, and then compare and analyse the differences between the two sets of results. For the two ‘headline’ results of average Fluency and Clarity, we find that in both studies, the system was rated more highly for Clarity than for Fluency, and Clarity had higher standard deviation. Clarity and Fluency ratings were higher, and their standard deviations lower, in the reproduction study than in the original study by substantial margins. Clarity had a higher degree of reproducibility than Fluency, as measured by the coefficient of variation. Data and code are publicly available.

DCU Online Research Access Service

MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

Author: Bryl Volha
Brümmer Martin
Consoli Sergio
Cucerzan Silviu
Devi Pooja
Ferreira Thiago Castro
Hoffart Johannes
Juan
Luo Gang
Nuzzolese Andrea-Giovanni
Röder Michael
Steinmetz Nadine
van Erp Marieke
Zhang Lei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/10/2017

Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-base agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages. Comment: Accepted in K-CAP 2017: Knowledge Capture Conference

arXiv.org e-Print Archive

Crossref